76 research outputs found
Mexican accreditation from a comparative perspective
Mexico formally set up a system of program and institutional accreditation in 2000. The workings of this system are, at first sight, very similar to processes in place in the European Union and other countries, but several contextual factors produce results that are markedly different from those in other places. From a comparative perspective, this article points out that the implementation of an accreditation system depends not only on technical decisions, but also on several political, legal and cultural conditions that can seriously hamper the system in practice.
DUMB: A Benchmark for Smart Evaluation of Dutch Models
We introduce the Dutch Model Benchmark: DUMB. The benchmark includes a diverse set of datasets for low-, medium- and high-resource tasks. The total set of nine tasks includes four tasks that were previously not available in Dutch. Instead of relying on a mean score across tasks, we propose Relative Error Reduction (RER), which compares the DUMB performance of language models to a strong baseline that can be referred to in the future, even when assessing different sets of language models. Through a comparison of 14 pre-trained language models (mono- and multilingual, of varying sizes), we assess the internal consistency of the benchmark tasks, as well as the factors that likely enable high performance. Our results indicate that current Dutch monolingual models under-perform and suggest training larger Dutch models with other architectures and pre-training objectives. At present, the highest performance is achieved by DeBERTaV3 (large), XLM-R (large) and mDeBERTaV3 (base). In addition to highlighting the best strategies for training larger Dutch models, DUMB will foster further research on Dutch. A public leaderboard is available at https://dumbench.nl.
Comment: EMNLP 2023 camera-ready
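The Relative Error Reduction idea in the abstract can be sketched in a few lines. This is a minimal illustration, not the benchmark's own implementation: it assumes scores are accuracies in [0, 1] and that error is simply 1 − accuracy; the function name and example numbers are hypothetical.

```python
# Hedged sketch of Relative Error Reduction (RER): how much of a strong
# baseline's error a model removes. Assumes error = 1 - accuracy.
def relative_error_reduction(model_score: float, baseline_score: float) -> float:
    """Positive RER means the model reduces the baseline's error;
    negative RER means it makes more errors than the baseline."""
    baseline_error = 1.0 - baseline_score
    model_error = 1.0 - model_score
    return (baseline_error - model_error) / baseline_error

# Illustrative numbers: baseline accuracy 0.80, model accuracy 0.90.
# Errors are 0.20 vs 0.10, so the model removes half of the baseline's error.
print(relative_error_reduction(0.90, 0.80))
```

Because RER is anchored to a fixed baseline rather than to whichever models happen to be evaluated, scores remain comparable when a different set of models is assessed later, which is the property the abstract highlights.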
Make the Best of Cross-lingual Transfer: Evidence from POS Tagging with over 100 Languages
Cross-lingual transfer learning with large multilingual pre-trained models can be an effective approach for low-resource languages with no labeled training data. Existing evaluations of the zero-shot cross-lingual generalisability of large pre-trained models use datasets with English training data and test data in a selection of target languages. We explore a more extensive transfer learning setup with 65 different source languages and 105 target languages for part-of-speech tagging. Through our analysis, we show that pre-training of both source and target language, as well as matching language families, writing systems, word order systems, and lexical-phonetic distance, significantly impacts cross-lingual performance. The findings described in this paper can be used as indicators of which factors are important for effective zero-shot cross-lingual transfer to zero- and low-resource languages.
What's so special about BERT's layers? A closer look at the NLP pipeline in monolingual and multilingual models
Experiments with transfer learning on pre-trained language models such as BERT have shown that the layers of these models resemble the classical NLP pipeline, with progressively more complex tasks being concentrated in later layers of the network. We investigate to what extent these results also hold for a language other than English. For this we probe a Dutch BERT-based model and the multilingual BERT model for Dutch NLP tasks. In addition, by considering the task of part-of-speech tagging in more detail, we show that also within a given task, information is spread over different parts of the network and the pipeline might not be as neat as it seems. Each layer has different specialisations and it is therefore useful to combine information from different layers for best results, instead of selecting a single layer based on the best overall performance.
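One common way to combine information from several layers, rather than picking a single best layer, is a softmax-weighted sum of per-layer representations (an ELMo-style "scalar mix"). The sketch below is an assumption about how such a combination could look, not the paper's exact method; the function name, toy vectors, and weights are hypothetical.

```python
import math

def scalar_mix(layer_states, raw_weights):
    """Combine per-layer representations with softmax-normalised weights.
    layer_states: one vector (list of floats) per layer, all the same length.
    raw_weights: one raw (unnormalised) score per layer."""
    # Softmax over the raw per-layer scores, shifted for numerical stability.
    m = max(raw_weights)
    exps = [math.exp(w - m) for w in raw_weights]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Element-wise weighted sum across layers.
    dim = len(layer_states[0])
    return [sum(weights[l] * layer_states[l][i] for l in range(len(layer_states)))
            for i in range(dim)]

# Toy example: three "layer" vectors for one token.
# Equal raw weights reduce to the element-wise mean of the layers.
layers = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
mixed = scalar_mix(layers, [0.0, 0.0, 0.0])
print(mixed)
```

In practice the raw weights would be learned jointly with the probing classifier, letting the model discover which layers carry the information relevant to a given task.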
- …